I. INTRODUCTION


The last decade has seen the general acceptance of the importance 
of syntax description and syntactical analysis in the development of, 
and support for, new automatic programming languages for digital com- 
puters. 

Since the introduction of the stored program computer, communication 
between man and his computer has been a problem. The laborious and time 
consuming numeric coding of machine language inhibited many potential 
computer users from learning how to communicate directly with the com- 
puter. The man with a computer solvable problem often found manual 
calculations easier than trying to communicate with either the expert 
programmer or the machine. A means of communication with the computer, 
more easily learned than machine language, was needed. This led to the 
development of problem-oriented languages. 

Problem-oriented languages such as ALGOL, PL/1, and FORTRAN were
designed to facilitate communication between user and computer for
solutions to mathematical and other special-interest problems.

These automatic programming languages generally consisted of a 
vocabulary which incorporated many key words from a profession or an 
interest area. The user could instruct the computer in a language 
similar to ordinary English usage. For example, instead of trying
to manipulate an array by laboriously coding a sequence of instructions
to achieve the transpose of a matrix, the user could achieve the same
goal by the use of a special reserved word, such as "TRANSPOSE". It
should be noted, however, that these special reserved words, such as
"TRANSPOSE", have a specific, or nonredundant, meaning. The programmer
must use these words in accordance with the coding restrictions of the
particular language in which he is programming.

The advent of the automatic programming languages attracted more 
users to the computer. These new users, as they discovered more 
applications for the computer, requested more languages specifically 
designed for their particular interests. Computer specialists 
responded by developing more new languages. 

The introduction of time sharing systems, with many terminals, 
magnified the problem of satisfying many users and their programming 
language requirements. Whereas previously only the large corporations 
could afford a computer installation, now many small businesses were 
able to share a central installation. The over-all effect provided 
a large body of users, solving computer problems with a wide variety 
of special purpose languages. 

The cost of these time sharing systems is distributed among the 
users. Each user is charged for the amount of computer processor 
time used by his program. Obviously, the user is interested in 
minimizing his processor time, thus decreasing costs and increasing 
profits. However, many of the users of time sharing systems are not 
completely familiar with the restrictions of the particular language 
in use. As a result, they use the computer to "trouble shoot" their 
programs for syntax errors. An experienced programmer finds that 
programs seldom compile correctly on the first attempt. Yet com- 
pilation of entire programs is attempted on each occasion with the 
associated consumption of expensive processor time. The need is 
apparent for a tool that will assist the programmer in eliminating
syntax errors, while minimizing expensive processor time.

A by-product of reducing the number of attempts to compile is
the additional processor time available. This time can be used to 
either improve response time to the current users, or to provide 
service to new users. 

Thus the objective of this thesis: to provide a basic universal
syntax checking program which could be expanded for use in a time-sharing
environment. This program is hereafter referred to as the
Universal Syntax Checker. This syntax checking program accepts the
description of any language and provides a syntax check of programs
written in the described language, as shown in Figure 1-1. The
syntax checker requires fewer facility resources than a language
processor, generates no code, and is intended for use in a time-sharing
environment where the various users may time-share the
universal syntax checker regardless of the language in use at each
terminal.

Figure 1-1. Universal Syntax Checker
II. DEFINITIONS


A. SYNTAX AND SEMANTICS 

In order to understand the operation of the universal syntax checker, 
a knowledge of the methods used to formulate and describe languages is 
necessary. 

There is no standard, uniform notation throughout the literature;
therefore the definitions and notation from [1] and [2] are
selected for use in this discussion.

The problem of describing a language involves a unique difficulty. 
During the discussion, a distinction must be made between the language 
being described and the describing language. If the language being 
described is called the "language", then the language in terms of 
which the description is being made is called the "metalanguage".

If these two languages are not distinguishable, much ambiguity and 
imprecision results. A discussion of how languages are described is 
in order. 

In the exposition of automatic programming languages, the
descriptions themselves can be classified into two types: syntactic
definitions and semantic discussions. The syntactic definitions make
explicit what structures are to be meaningful in the language. These
definitions are concerned with the proper construction of words,
expressions, and statements. The semantic discussions are concerned
with the meanings to be attributed to the various structures and their
proper usage in the language.


It often happens that in the process of describing a new computer 
language, the formats for statements are presented in terms of the natural 
language. These formats define which components are necessary and how 
they are to be put together to form meaningful statements. These are 
the syntax and semantics of the language. 

The syntax of a FORTRAN DO statement can be given as the natural
language definition:

        DO n i = $m_1$, $m_2$, $m_3$

where n is a statement identifier, i is a simple integer variable,
and $m_1$, $m_2$, and $m_3$ are simple integer variables.


The syntax definition, however, is not sufficient to specify a 
properly formed and meaningful DO statement. It is also necessary 
to discuss the semantics of the DO statement. The semantics could 
be given as follows: 

        $V(m_2) \ge V(m_1) > 0$ and $V(m_3) > 0$

where $V(m_i)$ is the value at execution of the variable or constant $m_i$.
It is also necessary that n refer to a statement not previously defined 
in the subprogram [1]. 

In general, the syntax and semantics of an entire language, (i.e., 
of each statement in the language), are given in order to specify the 
proper construction of meaningful statements in that language. 

To formalize the definitions in the metalanguage, each definition
is given the form of a statement or construct, sometimes called a
production [3]. Thus the syntactic definition of an "unsigned integer"
could be:

        An unsigned integer can be formed by writing
        a digit, or
        an unsigned integer followed by a digit.

This definition is also an example of a recursive definition. That
is, the term "unsigned integer" is defined in terms of itself.

Syntactic definitions can frequently be shortened by using a
metalanguage symbolism. The following symbols will be used:


SYMBOL          INTERPRETATION

< x >           angular brackets; a syntactic category, the
                object named x.

::=             "can be written as" or "is defined as".

|               reads as "or".

To repeat the recursive definition of an unsigned integer using
the above notation:

        < unsigned integer > ::= < digit > |
                                 < unsigned integer > < digit >

This method of syntactic definition of a language is called "specification
by Backus-normal form," abbreviated B.N.F.
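
The recursive production above can be followed almost literally in code. The following is a minimal sketch in Python, not part of the thesis program; the function name is_unsigned_integer and the choice of decimal digits for < digit > are assumptions made for illustration.

```python
# Assumed terminal set for <digit>; the thesis does not list it here.
DIGITS = set("0123456789")


def is_unsigned_integer(s: str) -> bool:
    """<unsigned integer> ::= <digit> | <unsigned integer><digit>"""
    if len(s) == 0:
        return False                      # the empty string matches neither alternative
    if len(s) == 1:
        return s in DIGITS                # first alternative: a single digit
    # second alternative: an unsigned integer followed by a digit
    return s[-1] in DIGITS and is_unsigned_integer(s[:-1])


if __name__ == "__main__":
    for candidate in ["7", "1969", "", "12a", "-5"]:
        print(repr(candidate), is_unsigned_integer(candidate))
```

The second alternative is left recursive, so the sketch peels the digit off the right end of the string before recursing on the remainder.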

The formalism for the semantic discussion of a particular language 
is not readily available or apparent. Although attempts have been 
made to make this formalization [1], it is not of concern here since
the universal syntax checker verifies the syntax, not the semantics,
of a particular language.

Footnote: Actually, the form is not normal; B.N.F. is often called Backus-Naur
form. Naur introduced the actual notation, and Backus introduced the
concept [2].

B. GRAMMAR AND LANGUAGE

As mentioned earlier, the universal syntax checker accepts the 
definition of a language and programs written in the language. 

The second input to the checker, (a program written in the 
described language), necessitates a discussion of grammar. 


Mention of the word "grammar" brings to mind many thoughts. Most
people relate the word to something learned in school, or think of a
set of words and rules which describe some language. Webster defines
grammar in the following manner:

        The science treating of the classes of words, their
        inflections, and their syntactical relations.

Most of us recall dissecting and diagramming sentences to learn the
syntactical relation of the parts of a sentence. This is known as
parsing. For example, the sentence "The little girl talks fast."
may be dissected after a syntactic definition of a small subset of
English.
1   < sentence >      ::= < noun phrase > < verb phrase >
2   < noun phrase >   ::= < adjective > < noun phrase > |
3                         < adjective > < singular noun >
4   < verb phrase >   ::= < singular verb > < adverb >
5   < adjective >     ::= the |
6                         little
7   < singular noun > ::= girl
8   < singular verb > ::= talks
9   < adverb >        ::= fast

Figure 2-1. Constructs of a sentence

Through the application of the syntax rules the sentence is parsed.
The results, when diagrammed, yield a syntax tree. Figure 2-2 shows
the diagram of this particular syntax tree.


< sentence >
    < noun phrase >
        < adjective >           The
        < noun phrase >
            < adjective >       little
            < singular noun >   girl
    < verb phrase >
        < singular verb >       talks
        < adverb >              fast

Figure 2-2. Diagram of the sentence
"The little girl talks fast."

The application of the rules in Figure 2-1 to the sentence "The
little girl talks fast." reveals that the sentence is grammatically
correct. One should observe that sentences can not only be tested
but may also be generated by the application of the same rules. Thus
application of rule 1 followed by as many applications of rule 2 as
desired, and so on until no further application of rules is possible,
would also yield a grammatically correct sentence. The sentence might
not mean anything, but it would be grammatically correct. The semantics
of a language prevent improper statement formulation. For example,
the application of rule 1, followed by the application of rule 2 twice,
would yield the sentence, "The the little girl talks fast," which is
grammatically correct, but nonsensical just the same.
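
The generation of sentences from the rules of Figure 2-1 can be sketched as a left-most derivation. The Python fragment below is an illustration only: the dictionary GRAMMAR, the function derive, and the convention of selecting alternatives by position are assumptions, and the words are written in lower case.

```python
# The grammar of Figure 2-1; each non-terminal maps to its list of alternatives.
GRAMMAR = {
    "sentence": [["noun phrase", "verb phrase"]],
    "noun phrase": [["adjective", "noun phrase"], ["adjective", "singular noun"]],
    "verb phrase": [["singular verb", "adverb"]],
    "adjective": [["the"], ["little"]],
    "singular noun": [["girl"]],
    "singular verb": [["talks"]],
    "adverb": [["fast"]],
}


def derive(start, choices):
    """Expand the left-most non-terminal repeatedly, taking the alternative
    whose position is given by the next entry of `choices`."""
    form = [start]
    picks = iter(choices)
    while True:
        idx = next((i for i, sym in enumerate(form) if sym in GRAMMAR), None)
        if idx is None:                      # only terminal symbols remain
            return " ".join(form)
        form[idx:idx + 1] = GRAMMAR[form[idx]][next(picks)]


if __name__ == "__main__":
    print(derive("sentence", [0, 0, 0, 1, 1, 0, 0, 0, 0]))
    # Applying the recursive alternative of rule 2 twice:
    print(derive("sentence", [0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]))
```

The second call yields the grammatically correct but nonsensical sentence discussed above, since nothing in the syntax limits how often the recursive alternative is chosen.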


To formalize the grammar presented, four ingredients were necessary.
First, there were the syntactic categories, the noun phrase, adjective,
etc., from which strings of words were derived. These syntactic
categories are called non-terminal symbols. Second, there were the
words from which the sentences could be constructed. These objects
are called terminal symbols. Third, there were the relations between the
various strings of terminal and non-terminal symbols, which are called
productions. These productions are the rules of syntax, as shown in
Figure 2-1. And last, there was one distinguished non-terminal
symbol, which appeared nowhere on the right of any production rule.
This distinguished symbol is referred to as the start symbol [4], or
the goal symbol [2]. The non-terminal symbol "sentence" was the goal
symbol in the example given.


Thus a grammar G may be defined as ($V_N$, $V_T$, P, Z). The symbols
$V_N$, $V_T$, P, and Z represent, in order, the set of non-terminal symbols,
the set of terminal symbols, a set of productions, and the goal symbol.

Let $V_N^*$, $V_T^*$, and $V^*$ be finite concatenations of symbols from $V_N$, $V_T$,
and $V_N$ union $V_T$ respectively. These concatenations are called strings.

Let $\Rightarrow_G^*$ represent the application of a finite sequence of
productions from G. $\beta \in V^*$ is said to be a "derivation" of $\alpha$ if
and only if there exists a sequence of productions such that
$\alpha \Rightarrow_G^* \beta$.

Let G be the subset of English defined earlier. Letting $\alpha$ =
< sentence >, $\beta$ = "The little girl talks fast.", the parsing tree
of Figure 2-2 shows that < sentence > $\Rightarrow_G^*$ "The little girl talks
fast."

A language L(G) is defined as {w | w is in $V_T^*$ and Z $\Rightarrow_G^*$ w}.
That is, a string is in L(G) if and only if

1. the string consists of a finite string of terminal symbols,
and
2. the string can be derived from Z by the application of a
finite sequence of productions from the grammar G.

The set of finite strings of terminal symbols defined by a grammar G
is called the set of final sentential forms.

Now the additional input to the universal syntax checker may be
firmly identified. The statement, "and a program written in the
described language," means that the syntax checker is provided a
string (program) which is a member of $V_T^*$.

Thus to summarize, the universal syntax checker receives the
following two inputs:

1. A description of a language L in B.N.F. (The productions,
non-terminal symbols, and terminal symbols.)

2. A finite string of terminal symbols from the described
language. (An element of $V_T^*$.)

A formal description of the universal syntax checker is now
possible.

Given the B.N.F. of a language L, and a finite string of terminal
symbols from the described language (an element of $V_T^*$), the universal
syntax checker will determine whether that string is, or is not, a member
of L(G) for the language L defined.
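
The membership question stated above can be illustrated directly from the definition of L(G). The Python sketch below is not the method used by the universal syntax checker; it simply searches the sentential forms derivable from the goal symbol. The names GRAMMAR and in_language are assumptions, and the search relies on the grammar of Figure 2-1 having no empty productions, so that a sentential form never becomes shorter.

```python
from collections import deque

# The grammar of Figure 2-1 (lower-case words assumed).
GRAMMAR = {
    "sentence": [["noun phrase", "verb phrase"]],
    "noun phrase": [["adjective", "noun phrase"], ["adjective", "singular noun"]],
    "verb phrase": [["singular verb", "adverb"]],
    "adjective": [["the"], ["little"]],
    "singular noun": [["girl"]],
    "singular verb": [["talks"]],
    "adverb": [["fast"]],
}


def in_language(grammar, goal, words):
    """Return True if `words` can be derived from `goal`."""
    target = tuple(words)
    seen = set()
    queue = deque([(goal,)])
    while queue:
        form = queue.popleft()
        if form == target:
            return True
        # locate the left-most non-terminal of the sentential form
        idx = next((i for i, s in enumerate(form) if s in grammar), None)
        if idx is None:
            continue                          # a different string of terminals
        if form[:idx] != target[:idx]:
            continue                          # terminal prefix already disagrees
        for alternative in grammar[form[idx]]:
            new = form[:idx] + tuple(alternative) + form[idx + 1:]
            # forms never shrink, so anything longer than the input is useless
            if len(new) <= len(target) and new not in seen:
                seen.add(new)
                queue.append(new)
    return False


if __name__ == "__main__":
    print(in_language(GRAMMAR, "sentence", "the little girl talks fast".split()))
    print(in_language(GRAMMAR, "sentence", "little talks fast".split()))
```

The length bound is what guarantees termination here; a grammar with empty productions would need a different pruning rule.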


C. RELATED RESEARCH 

The use of "syntax-directed" techniques is not new. These
techniques have been used in constructing natural language translators
[5], compilers [6,7], automatic programming language translators [8,
9, 10], and context-free grammar recognizers [11].

Syntax-directed analysis of natural languages is an unsolved problem
due primarily to the inability to precisely define the syntax of each
language. An excellent survey of the techniques employed can be found
in Bobrow [5].

Griffiths and Petrick [12] describe many types of recognition
procedures, all syntax-directed, for context-free grammars. They 
employ Turing Machine algorithms to highlight the various methods. 
Turing machines were used to preserve clarity, conciseness, and to 
allow comparisons of procedures on the same level of complexity. The 
universal syntax checker more closely approaches the selective top to 
bottom (STB) method than any other described. 

Irons [6] develops a syntax-directed ALGOL 60 compiler. The
arrangement of the various tables that contain the specifications of the
language is very similar to that employed in the universal syntax
checker. Each specifies entries for all syntactic units, as integers,
in the described language with a one-to-one correspondence between the
actual symbol vector and the integer representation of that description.
The two methods differ in that Irons uses a bottom-to-top method;
hence his need for a precedence matrix to ensure that the longest
string possible is accepted. The syntax checker requires reordering
of the B.N.F. to present the longest string possible before any
shorter string.

Feldman and Gries discuss the pushdown stack method [3] as one
of the two ways to achieve a parsing process. This same method is
an integral part of the "cellar principle" used to design a syntax
controlled generator of formal language processors [13]. The cellar
principle is based on sequential processing of the input symbols.
The syntax checker also employs a sequential processing of symbols, and
the recursive procedures available in PL/1 incorporate the pushdown
stack implicitly for all variables.

Barnett and Futrelle present an account of the SHADOW language
that is used to describe a syntax of a language and an associated 
subroutine which parses an input string [14]. The method requires 
that a mnemonic argument, in addition to the input string, be provided 
to the SHADOW subroutine. These arguments include the names of arrays 
which contain the string and the syntax. Thus, if a parse of a rational 
fraction is desired the mnemonic RATFRN is a required input argument. 
The appearance of this mnemonic, requesting a syntactic analysis, 
causes the subroutine to use the most recently read requested pattern 
name and input string. This system is obviously unsuited for the time- 
sharing environment due to the requirement that the programmer be 
familiar with the syntax of the language in use. 

Unger presents a Global Parser (GP), for phrase structured 
grammars [11]. The method he employs is also very close to the method 
used in the universal syntax checker. Some major differences do exist. 
The primary difference is his use of a set of routines for determining 
possible prefixes and suffixes of N-derivable strings, finding the 
minimum length of such strings, and sub-strings that can never appear 
in N-derivable strings. Another difference is the method of comparison
between the production and input string. Unger claims that by matching
all the terminal symbols of the intermediate string against the input
string and constantly partitioning the input string, a broad class of
checks can be made to terminate fruitless paths quickly. This parser
will not handle cyclic definitions.


An error correcting parse algorithm utilizing a syntax-directed
scheme is described by Irons [15]. The algorithm provides two services.
First, the algorithm provides a parse of strings written in the language
described. Second, if an incorrect string is presented, the algorithm
will make substitutions, insertions, and deletions to make the object
string syntactically correct. The approach is different from the
syntax checker in that all possible parses are carried along until
one can be determined to be correct. Backtracking and recursive
productions are not allowed. Recursive productions are replaced by a
similarly powerful definition, which allows iterations in pairs of
symbols.

Most authors agree that there are several advantages and disadvantages
to syntax-directed procedures. Among the advantages are
the simplicity of compiler construction and the ability to change the
specifications of a particular language. In addition, languages may be
switched by merely changing the contents of the syntax table. Further,
the syntax-directed parser can take into account as large a context as
is required to perform the parsing. The disadvantages include the
difficulty in trying to specify the syntax of a language, the fact
that syntax-directed compilers contribute little toward the generation
of optimum code, and generally poor error analysis and recovery. Error
type determination is nearly impossible.


D. PARSING METHOD

It has been shown that taking a string of symbols and a grammar,
and constructing a derivation of the string to form its syntax tree,
is called parsing, recognizing, or analyzing.

There are two basic types of parsing methods: top-down and bottom-up.
The bottom-up method will not be covered but is explained in [3]. The 
top-down parser gets its name from the fact that it is goal oriented. 
The top-down parser starts with the most global production (goal symbol) 
and works its way down the productions, attempting to match the input 
string. Each method can be further qualified as left-right or right- 
left, depending on the order of processing the symbols. 

In order to discuss the manner employed by the syntax checker to
accomplish its assigned tasks, consider the following example of a
language and a method to parse such a language:

1. Grammar

Given the following grammar:


<Z>::= <a>
<a>::= <b>|
       <a>+<b>
<b>::= <c>|
       <b>-<c>
— CSS Aly 


(<a>) 
Non-terminal symbols: Z, a, b, c. 
Terminal symbols: soe ie eats en) 
2. Derivations

To discuss ambiguity it is necessary first to discuss
derivations. Observe that the following string has more than one
derivation. Thus, <c>+<b>-<c> can be derived by two sets
of productions:

<Z> => <a> => <a>+<b> => <b>+<b> => <c>+<b> => <c>+<b>-<c>

<Z> => <a> => <a>+<b> => <a>+<b>-<c> => <b>+<b>-<c> => <c>+<b>-<c>


3. Syntax Trees

The construction of the syntax tree is now shown:

<Z>
    <a>
        <a>
            <b>
                <c>
        +
        <b>
            <b>
            -
            <c>

Although there may be more than one set of productions 
which yields the same string of symbols, their syntax-trees are the 
same. Ambiguity exists only when one or more strings have more than 
one syntax tree. 

Observe that the syntax tree shown below would result if the 
Strine. <5¢ o= - —abe : <c>, not a member of L(G), is parsed by 


proceeding left at every junction down the tree (applying the left- 


most derivation). 


<— 7 
ines 
< ! > 
os 
| 
4. Procedure for a top-down left-right parser

To acquire familiarity with the top-down left-right parser,
consider the following procedure. The top-down left-right parser makes
an initial assumption that the string presented is a valid final
sentential form, and then proceeds as follows:

step 1. Establish goal symbol, in this case Z.

step 2. Apply a production to the goal or subgoal.

step 3. First symbol of resultant string a terminal?
        Yes, continue; No, go to step 6.

step 4. Terminal symbol and input string match?
        Yes, continue; No, go to step 7.

step 5. More symbols in string of production?
        Yes, continue; No, done.

step 6. Establish subgoal and go to step 2.

step 7. An alternate, (OR), production?
        Yes, go to step 6; No, done.


A slightly modified form of the above procedure will recognize
grammars with left, right, and self-embedded recursion. In addition,
ambiguous grammars are processed such that when two or more syntax
trees are available, the first, as determined by its placement in the
B.N.F., will be recognized.

This simplified procedure has been provided to acquaint the
reader with the general procedure of top-down left-right parsing.
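
The general idea of the simplified procedure can be sketched with a small backtracking recognizer. The Python fragment below is an illustration under stated assumptions, not the RECOGNIZE procedure of the thesis: the names GRAMMAR, match, match_sequence, and syntax_ok are hypothetical, the grammar is that of Figure 2-1, and left-recursive productions such as <a> ::= <a>+<b> are not handled and would recurse without end.

```python
# The grammar of Figure 2-1 (lower-case words assumed).
GRAMMAR = {
    "sentence": [["noun phrase", "verb phrase"]],
    "noun phrase": [["adjective", "noun phrase"], ["adjective", "singular noun"]],
    "verb phrase": [["singular verb", "adverb"]],
    "adjective": [["the"], ["little"]],
    "singular noun": [["girl"]],
    "singular verb": [["talks"]],
    "adverb": [["fast"]],
}


def match(grammar, symbol, words, pos):
    """Yield every input position reachable after recognizing `symbol` at `pos`."""
    if symbol not in grammar:                 # terminal: compare with the input
        if pos < len(words) and words[pos] == symbol:
            yield pos + 1
        return
    for alternative in grammar[symbol]:       # subgoal: try alternatives in order
        yield from match_sequence(grammar, alternative, words, pos)


def match_sequence(grammar, symbols, words, pos):
    """Recognize the symbols of one production from left to right."""
    if not symbols:
        yield pos
        return
    for mid in match(grammar, symbols[0], words, pos):
        yield from match_sequence(grammar, symbols[1:], words, mid)


def syntax_ok(grammar, goal, words):
    """The string is accepted if some parse of the goal consumes all of it."""
    return any(end == len(words) for end in match(grammar, goal, words, 0))


if __name__ == "__main__":
    print(syntax_ok(GRAMMAR, "sentence", "the little girl talks fast".split()))
    print(syntax_ok(GRAMMAR, "sentence", "girl talks fast".split()))
```

Generators are used so that a failed match further along the input automatically falls back to the next alternative, which plays the role of step 7 above.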
III. SUPPORT ROUTINES


A. DISCUSSION 

The previous chapters presented the schema of the universal syntax 
checker. Included was an explanation of the input requirements of the 
system as well as a simplified method to accomplish the syntax checking 
task. 

Before a detailed discussion of the actual parsing algorithm can
begin, it is necessary to describe certain grammar manipulations and
tables required to support the algorithm.


B. TABLES 

The manner in which the grammar is placed into the supporting 
tables is now presented. Note that the identifiers enclosed in the 
parentheses immediately following the table titles, are the actual 
identifiers which were used in the coding of the universal syntax 
checker program. 

1. B.N.F. Table (P)

This table contains the definitions of the B.N.F. A definition of the form
< identifier > ::= < letter > | < identifier > < letter > appears
in the table as shown in Figure 3-1, where i is the number of symbols
in the production and j is the number of productions describing the
language. The example is taken from the first grammar listed in the
computer program listing.

              1            2            3       ...   i

    1     IDENTIFIER   LETTER
    2     IDENTIFIER   IDENTIFIER   LETTER
    .
    .
    5     LETTER       A
    .
    .
    j

Figure 3-1. The B.N.F. table.

2. Symbol Table (SYMTAB)

The symbol table is constructed from the grammar placed in
the B.N.F. table. This symbol table contains all the elements of $V_N$
followed by all the elements of $V_T$. The first entry in the table is
the empty string.

Since no terminal symbol may appear as the left part of a production,
the non-terminal symbols are located by examining the
first column of the B.N.F. table and applying the following logic:

step 1. Is the symbol being examined listed in the symbol table?
        Yes, examine next symbol; No, go to step 2.

step 2. Put the symbol in the symbol table, increment
        the table pointer, and examine the next symbol.

After noting the location of the last non-terminal symbol, the
terminal symbols are placed into the table. Since no terminal
symbol may appear on the left of a production, these symbols are
determined using the same logic as before, excluding the first column
and examining the remaining columns of the B.N.F. table. Referring to
the grammar shown in Figure 3-1, Figure 3-2 shows the symbol table upon
completion of its construction. The identifier, nsymbols, refers to
the total number of symbols in the grammar.
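
The two passes over the B.N.F. table can be sketched as follows. This Python fragment is an illustration only: the function build_symtab, the list-of-rows representation of the B.N.F. table, the unused slot 0 (so that subscripts start at 1 as in the text), and the rows for the letters A and B are assumptions.

```python
# A small B.N.F. table; column 1 of each row is the left part of a production.
BNF = [
    ["IDENTIFIER", "LETTER"],
    ["IDENTIFIER", "IDENTIFIER", "LETTER"],
    ["LETTER", "A"],
    ["LETTER", "B"],
]


def build_symtab(bnf):
    """Return (symtab, nonterm): the symbol table and the subscript of the
    last non-terminal symbol. Slot 0 is unused so that subscripts start at 1."""
    symtab = [None, ""]                   # entry 1 is the empty string
    for row in bnf:                       # pass 1: first column only
        if row[0] not in symtab:
            symtab.append(row[0])
    nonterm = len(symtab) - 1             # location of the last non-terminal
    for row in bnf:                       # pass 2: the remaining columns
        for symbol in row[1:]:
            if symbol not in symtab:
                symtab.append(symbol)
    return symtab, nonterm


if __name__ == "__main__":
    symtab, nonterm = build_symtab(BNF)
    nsymbols = len(symtab) - 1
    print(symtab[1:], nonterm, nsymbols)
```

With this layout any subscript greater than nonterm denotes a terminal symbol, which is the kind of test the comparison against nonterm in the reference ALGOL listing appears to make.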

3. Onright Table (ONRIGHT)

There is a one-to-one correspondence between the subscript
numbers of the symbol table and the subscript numbers of the onright
table.

               SYMTAB

    1          (empty string)
    2          IDENTIFIER
    .
    .
    nsymbols   (last symbol)

Figure 3-2. The Symbol table.

Thus, ONRIGHT (i) is marked "true" if and only if the symbol in
SYMTAB (i) appears in the right part of some production. A search
of the "false" entries in the onright table produces the goal symbol.
More than one "false" entry indicates that the goal symbol is not
unique and therefore the grammar is unacceptable. Figure 3-3 depicts
the onright table for an acceptable grammar.
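
The search for the goal symbol can be sketched in a few lines. The Python fragment below is a sketch under assumptions: the function find_goal is hypothetical, a production Z ::= IDENTIFIER has been added so that the example grammar has a goal symbol, and the search is restricted to the non-terminal entries so that the empty string in entry 1 is never mistaken for the goal.

```python
# B.N.F. table with an added goal production (an assumption for this example).
BNF = [
    ["Z", "IDENTIFIER"],
    ["IDENTIFIER", "LETTER"],
    ["IDENTIFIER", "IDENTIFIER", "LETTER"],
    ["LETTER", "A"],
    ["LETTER", "B"],
]
# Symbol table laid out as described in the text; slot 0 is unused.
SYMTAB = [None, "", "Z", "IDENTIFIER", "LETTER", "A", "B"]
NONTERM = 4                               # subscript of the last non-terminal


def find_goal(bnf, symtab, nonterm):
    """Build the onright table and return the subscript of the goal symbol."""
    onright = [False] * len(symtab)
    for row in bnf:
        for symbol in row[1:]:            # right part of each production
            onright[symtab.index(symbol)] = True
    candidates = [i for i in range(2, nonterm + 1) if not onright[i]]
    if len(candidates) != 1:
        raise ValueError("goal symbol is not unique; grammar unacceptable")
    return candidates[0]


if __name__ == "__main__":
    goal = find_goal(BNF, SYMTAB, NONTERM)
    print(goal, SYMTAB[goal])
```

A recursive non-terminal such as IDENTIFIER appears on the right of its own production, which is why the example needs a separate goal production.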
Figure 3-3. The Onright table.

4. Production Table (PR)

There is a one-to-one correspondence between the production
table (PR) and B.N.F. table (P). Also, there exists an onto mapping
from the entries in the production table to the subscript numbers of
the symbol table. Thus, wherever the symbol in SYMTAB (i) appears
in the B.N.F. table, the entry "i" is made in the production table.
For example, consider the symbol table in Figure 3-2. Wherever the
symbol IDENTIFIER appears in the B.N.F. table shown in Figure 3-1,
an entry "2" is made in the production table as shown in Figure 3-4.
Note the recursive production occurring in the second row.

Figure 3-4. The Production table.
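
The encoding of the B.N.F. table into integers can be sketched as follows. This Python fragment is an illustration, not the thesis code: the function build_pr is hypothetical, and padding short rows with the subscript of the empty string (entry 1) is an assumption suggested by, but not stated in, the text.

```python
# Rows of Figure 3-1 that are legible, and the matching symbol table.
BNF = [
    ["IDENTIFIER", "LETTER"],
    ["IDENTIFIER", "IDENTIFIER", "LETTER"],
    ["LETTER", "A"],
]
SYMTAB = [None, "", "IDENTIFIER", "LETTER", "A"]   # slot 0 unused
EMPTY = 1                                          # subscript of the empty string


def build_pr(bnf, symtab):
    """Replace every symbol of the B.N.F. table by its symbol table subscript."""
    index = {sym: i for i, sym in enumerate(symtab) if sym is not None}
    width = max(len(row) for row in bnf)
    # Assumption: unused columns are filled with the empty-string subscript.
    return [[index[s] for s in row] + [EMPTY] * (width - len(row)) for row in bnf]


if __name__ == "__main__":
    for row in build_pr(BNF, SYMTAB):
        print(row)
```

In the second row the first two entries are equal, which is how a left-recursive production shows up once the table is in integer form.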
IV. THE PARSING ALGORITHM


A. BACKGROUND 

The inputs, general parsing procedure, and support tables of the 
universal syntax checker have been described. A presentation of the 
actual programming implementation remains. This implementation was 
accomplished utilizing the recursive procedures available in the PL/1 
programming language. The operation of the major procedure, called 
RECOGNIZE, is dependent upon a symbol buffer and accumulator. The 
buffer contains a portion of the input string. The accumulator 
provides symbol back-up and is the heart of the procedure NEXTSYM. 
These two procedures will be discussed briefly before a description
of the algorithm is given in reference ALGOL [16].


B. PROCEDURES

1. Nextsym

The NEXTSYM procedure centers around the accumulator shown
in Figure 4-1. Each element of the accumulator contains an entry
"i" corresponding to SYMTAB (i) for each symbol recognized in the
buffer.

Associated with the accumulator are three pointers: ap, tap,
and acclen. The accumulator pointer, ap, points to the symbol being
examined. The temporary accumulator pointer, tap, retains the value
of the accumulator pointer at each junction in the syntax tree that
the procedure examines. The accumulator length pointer, acclen,
retains the total number of accumulator positions filled.

Figure 4-1. Accumulator

The accumulator will hold the first fifty symbols from the input
string. Thereafter, it will contain the most recent forty to fifty
symbols. The variable "offset" serves to adjust the subscript number 
i of the accumulator while allowing the accumulator pointer to increase 
sequentially. 
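
The sliding behavior of the accumulator can be sketched as a fixed window over the input. The Python class below is an illustration under assumptions and is not a transcription of NEXTSYM: the class name Accumulator, the callable scan that supplies the next symbol subscript, and the use of an exception in place of the "depth of search exceeded" message are all hypothetical.

```python
SIZE, SHIFT = 50, 10          # window length and the amount dropped when full


class Accumulator:
    def __init__(self, scan):
        self.scan = scan                  # callable returning the next symbol subscript
        self.slots = [0] * (SIZE + 1)     # positions 1..SIZE are used
        self.ap = 0                       # accumulator pointer
        self.acclen = 0                   # number of positions filled
        self.offset = 0                   # symbols already shifted out

    def nextsym(self, k):
        """Return True and advance ap if the next symbol has subscript k."""
        i = self.ap + 1 - self.offset     # position of the next symbol in the window
        if i < 1:
            raise LookupError("depth of search exceeded")
        while i > self.acclen:            # the symbol has not been read yet
            if self.acclen == SIZE:       # window full: drop the oldest SHIFT symbols
                self.slots[1:SIZE - SHIFT + 1] = self.slots[SHIFT + 1:SIZE + 1]
                self.offset += SHIFT
                self.acclen -= SHIFT
                i -= SHIFT
            else:
                self.acclen += 1
                self.slots[self.acclen] = self.scan()
        if self.slots[i] == k:
            self.ap += 1
            return True
        return False


if __name__ == "__main__":
    acc = Accumulator(iter(range(1, 1000)).__next__)
    print(all(acc.nextsym(k) for k in range(1, 61)), acc.offset)
    acc.ap = 55                           # back up within the window
    print(acc.nextsym(56))
```

Backing up is done by resetting ap, as the parser does at each junction of the syntax tree; a back-up that reaches past the symbols still held in the window cannot be honored.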

2. Recognize

The RECOGNIZE procedure is a top-down left-right slow-back
parser. The method, as described in Chapter II, has been implemented
with one exception: the parser attempts to recognize the left-most
derivation first. The B.N.F. presented to the universal syntax checker
must be modified so that the left-most derivation is the longest string
possible (see Appendix A).


C. THE SYNTAX CHECKING ALGORITHM 

The description of the syntax checking algorithm follows (reference
ALGOL). The ALGOL version of the algorithm has not been run on
a computer, although the PL/1 version presented in the computer
program section has been tested, and the results are in the computer
output section.
procedure MAIN;


integer bp, ap, acclen, offset, i, j, k, tap, nonterm,
nsymbols, bl, linecount, target, scan, a;
array Symtab [1:nsymbols], Buffer [1:80], Accum [0:50],
PR [tones | 
boolean array Onright [1:nsymbols];
boolean change, empty, done, check; 

comment This procedure is a top-down left-right slow-back parser. 
To recognize a final sentential form of the B.N.F. presented to 
the procedure, the distinguished symbol is established as the 
goal symbol, and productions are applied, where valid, in the 
order presented. Once a terminal symbol is encountered, the 
symbol is matched with the next symbol in the input string. 
If the match is successful, an attempt is made to match the 
next symbol of the string, and so on. If a match is not 
found, then an alternate production is attempted in an effort 
to recognize the symbol in the buffer. On completion of the 
procedure, the input string will be declared syntactically 
correct or incorrect; 

procedure LOOKUP (k); 


value k; integer k; 


béeganetor j := £ uneibenpeeds 


if PR [j, 1] = k then begin
LOOKUP := j;
go to EXIT end;


EXIT: end LOOKUP;
boolean procedure NEXTSYM (k);
value k; integer k; 

begin comment This procedure controls the accumulator and checks 
for recognition of the symbols in the buffer with the results 
of the application of production rules; 

integer i, m;

i := (ap + 1) - offset;
if i > acclen then begin
   for i := i while (true) do
      if i ≤ 50 then begin
         for j := acclen + 1 step 1 until i do begin
            accum [j] := SCAN;
            if check then write (' ', symtab [accum [j]]); end;
         acclen := i;
         if accum [i] = k then begin
            ap := ap + 1;
            NEXTSYM := true; end
         else NEXTSYM := false;
         go to A; end
      else begin offset := offset + 10;
         for m := 1 step 1 until 40 do
            accum [m] := accum [m + 10];
         i := i - 10; end
   end
else if i = 1 then begin
   write (' depth of search exceeded');
   go to A; end
else begin
   if accum [i] = k then begin
      ap := ap + 1;
      NEXTSYM := true; end
   else NEXTSYM := false;
   end;
A: end NEXTSYM;
boolean procedure RECOGNIZE (production, element) ; 
value production, element; 
integer production, element;
begin comment This procedure attempts to recognize each of the 
symbols in the input string. If a symbol is recognized then 
RECOGNIZE is set to true, otherwise false; 
integer k, lr, tap;
lr := 0;
ieee heme mer = a) Teleeioy (eel ae i jeue 


Were Wetec il loeiel [press uil 
if PR [production, 1] = PR [production, 2] then
if ¬ RECOGNIZE (production, 3) then begin
ap := tap;
RECOGNIZE := if PR [production, 1] = PR [production + 1, 1]
then RECOGNIZE (production + 1, 1)
else false;
go to OUT;
end
else RECOGNIZE := true
else if RECOGNIZE (production, 2) then begin
tap := ap;
if check then begin
write (symbol found) ;
for production := production while (PR [production, 1] =
PR [production + 1, 1] /\ PR [production + 1, 1] #
PR [production + 1, 2]) do
production := production + 1
else end;
if lr # 0 then begin
write (symbol, 'number of left recursive production found');
ap := tap;
RECOGNIZE := true;
end
else RECOGNIZE := false;
go to OUT;
tap := ap;
RECOGNIZE := if PR [production, 1] = PR [production
+ 1, 1] /\ PR [production + 1, 1] #
PR [production + 1, 2] then
RECOGNIZE (production + 1, 1)
else false;
go to OUT;
end
k := PR [production, element];
RECOGNIZE := if k # 1 then
if k > nonterm then
RECOGNIZE := if NEXTSYM (k) then 
RECOGNIZE (production, 
element + 1) 
else false 
else 
RECOGNIZE := if RECOGNIZE (LOOKUP (k), 1) 
then RECOGNIZE (production, 
element + 1) 
else false 
else true; 
end 
else RECOGNIZE := true; 
OUT : 
end RECOGNIZE; 
if ¬ done then
begin if RECOGNIZE (1, 1) then write ( 'syntax ok' )
else write ( 'syntax error' ); k := 0;
for k := k while (k # nsymbols /\ ¬ done) do k := scan;
bp := 72; ap := acclen := offset := linecount := 0; 


end MAIN: 
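
The top-down, left-to-right strategy described in the opening comment of the listing can be illustrated by a short sketch. The sketch is written in Python rather than ALGOL so that it can be run directly, and it is not a transcription of the procedures above. The names `derive`, `match_sequence`, `recognize`, and `GRAMMAR` are hypothetical, the sample grammar is an assumption chosen to avoid left recursion, and full backtracking over the alternatives stands in for the accumulator and pointer bookkeeping of NEXTSYM and RECOGNIZE.

```python
from typing import Dict, Iterator, List, Sequence

# Assumption: a grammar maps each nonterminal to its alternatives, listed in
# the order in which they are to be tried. Any symbol that is not a key of
# the dictionary is treated as a terminal.
Grammar = Dict[str, List[List[str]]]


def derive(grammar: Grammar, symbol: str,
           tokens: Sequence[str], pos: int) -> Iterator[int]:
    """Yield every input position reachable after matching `symbol` at `pos`."""
    if symbol not in grammar:
        # Terminal: match it against the next symbol of the input string.
        if pos < len(tokens) and tokens[pos] == symbol:
            yield pos + 1
        return
    # Nonterminal: apply its productions in the order presented. Asking the
    # generator for another value is what backs up to the next alternative.
    for alternative in grammar[symbol]:
        yield from match_sequence(grammar, alternative, tokens, pos)


def match_sequence(grammar: Grammar, symbols: Sequence[str],
                   tokens: Sequence[str], pos: int) -> Iterator[int]:
    """Match the right-hand side `symbols` left to right, starting at `pos`."""
    if not symbols:
        yield pos
        return
    for middle in derive(grammar, symbols[0], tokens, pos):
        yield from match_sequence(grammar, symbols[1:], tokens, middle)


def recognize(grammar: Grammar, goal: str, tokens: Sequence[str]) -> bool:
    """Return True if the whole token string can be derived from `goal`."""
    return any(end == len(tokens) for end in derive(grammar, goal, tokens, 0))


# Hypothetical sample grammar, longest alternative first.
GRAMMAR: Grammar = {
    "<expr>": [["<term>", "+", "<expr>"], ["<term>"]],
    "<term>": [["(", "<expr>", ")"], ["a"]],
}

if __name__ == "__main__":
    for text in ("a+(a+a)", "a+"):
        verdict = "syntax ok" if recognize(GRAMMAR, "<expr>", list(text)) else "syntax error"
        print(text, verdict)
```

A left-recursive production would make this sketch recurse without end, which is why the listing above treats such productions separately.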

V. CONCLUSIONS


A. USES 


The algorithm presented has many applications. The one to which it
is best suited is use in a time-sharing environment.

In a time-sharing system, the universal syntax checker could reside 
on a direct access storage device, along with the B.N.F. definitions, 
to be called when desired. Each terminal user would be able to time- 
share this syntax checker regardless of the language used. The syntax 
checker would provide a very rapid syntax check for each user at each 
terminal. As soon as the first syntax error was encountered, syntax 
checking of that program would end. The user at an on-line terminal 
would then examine the incorrect statement and effect a correction. 
The syntax checker would then restart the program analysis. Figure 5-1
depicts a possible configuration for a time-sharing system with a
syntax checker.


[Figure: block diagram. TERMINALS are connected to the COMPUTER, which
is connected to a DASD holding the Syntax Checker.]

Figure 5-1. Time-sharing system with syntax checker. 

B. FURTHER RESEARCH AND IMPROVEMENTS

One obvious need is the actual implementation of the universal 
syntax checker in a time-sharing system. 

Two major improvements are also needed. First, a recovery procedure
is needed which will allow recovery to a logical restart point, so that
syntax checking can continue after a syntax error is encountered.
Second, there is a need for a more complete and precise set of
diagnostic messages. Simply to say "syntax ok" or "syntax error" is not
enough. Precise statements of where the error occurred, what was found,
and why it is an error are needed.
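
One possible shape for such a diagnostic is sketched below in Python. It is an illustration only and is not part of the syntax checker presented here. The names `Diagnostic` and `locate` are hypothetical, and the sketch assumes that the parser can report the character offset of the offending symbol together with the symbols it would have accepted at that point.

```python
from dataclasses import dataclass
from typing import Tuple


def locate(source: str, offset: int) -> Tuple[int, int]:
    """Convert a 0-based character offset into a 1-based (line, column)."""
    before = source[:offset]
    line = before.count("\n") + 1
    # rfind returns -1 when there is no earlier newline, so the first line
    # starts at index 0.
    column = offset - (before.rfind("\n") + 1) + 1
    return line, column


@dataclass(frozen=True)
class Diagnostic:
    line: int                   # where: line of the offending symbol
    column: int                 # where: column of the offending symbol
    found: str                  # what: the symbol actually read
    expected: Tuple[str, ...]   # why: symbols the grammar would have accepted

    def render(self) -> str:
        wanted = ", ".join(self.expected) if self.expected else "nothing further"
        return (f"syntax error at line {self.line}, column {self.column}: "
                f"found '{self.found}'; expected one of: {wanted}")


if __name__ == "__main__":
    program = "begin\n x := ;\nend"
    line, column = locate(program, program.index(";"))
    print(Diagnostic(line, column, ";", ("<expr>",)).render())
```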

APPENDIX A


Backus-normal form requirements.

1. Maximum number of characters per symbol is ten.

2. Maximum number of symbols per production is eight.

3. Maximum number of productions per B.N.F. is three hundred.

4. Left recursive productions must include the trivial production
prior to the recursive production.

5. Cyclic productions require the trivial productions as in 4.

6. The B.N.F. must be ordered to present the longest string possible
prior to any other production.

7. The character string $PROGRAM is not allowed in a production for
a language. This string is used as an indicator to mark the end
of the B.N.F. and the start of each program submitted for parsing.
Place the characters $PROGRAM immediately after the B.N.F. submitted
and immediately after each program submitted for syntax checking.
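
Several of these requirements can be checked mechanically before a B.N.F. is submitted. The Python sketch below is an illustration under stated assumptions, not part of the syntax checker. The names `check_bnf` and `split_submission` are hypothetical. The sketch assumes that the limit of eight symbols counts the left-hand side together with the right-hand side, that the ten-character limit applies to a symbol exactly as written, and that a "trivial production" is an earlier production for the same left-hand side that is not left recursive. The requirements on cyclic productions and on ordering are not checked.

```python
from typing import List, Sequence, Tuple

MAX_SYMBOL_CHARS = 10
MAX_SYMBOLS_PER_PRODUCTION = 8   # assumption: left-hand side included
MAX_PRODUCTIONS = 300
SENTINEL = "$PROGRAM"

Production = Tuple[str, Sequence[str]]   # (left-hand side, right-hand side)


def check_bnf(productions: Sequence[Production]) -> List[str]:
    """Return a list of violations; an empty list means none were found."""
    problems = []
    if len(productions) > MAX_PRODUCTIONS:
        problems.append(f"{len(productions)} productions exceed the limit "
                        f"of {MAX_PRODUCTIONS}")
    has_trivial = set()   # left-hand sides with a non-left-recursive production
    for number, (lhs, rhs) in enumerate(productions, start=1):
        symbols = [lhs, *rhs]
        if len(symbols) > MAX_SYMBOLS_PER_PRODUCTION:
            problems.append(f"production {number}: more than "
                            f"{MAX_SYMBOLS_PER_PRODUCTION} symbols")
        for symbol in symbols:
            if symbol == SENTINEL:
                problems.append(f"production {number}: {SENTINEL} is reserved")
            elif len(symbol) > MAX_SYMBOL_CHARS:
                problems.append(f"production {number}: symbol {symbol!r} is "
                                f"longer than {MAX_SYMBOL_CHARS} characters")
        if rhs and rhs[0] == lhs:
            # Left recursive: a trivial production must already have been seen.
            if lhs not in has_trivial:
                problems.append(f"production {number}: left recursion on {lhs} "
                                f"without an earlier trivial production")
        else:
            has_trivial.add(lhs)
    return problems


def split_submission(text: str) -> Tuple[str, List[str]]:
    """Split input of the form: B.N.F. $PROGRAM program $PROGRAM program $PROGRAM."""
    parts = text.split(SENTINEL)
    if len(parts) < 2 or parts[-1].strip():
        raise ValueError(f"the B.N.F. and every program must be followed by {SENTINEL}")
    return parts[0].strip(), [part.strip() for part in parts[1:-1]]


if __name__ == "__main__":
    bnf = [("<list>", ["<item>"]),
           ("<list>", ["<list>", ",", "<item>"]),
           ("<item>", ["a"])]
    print(check_bnf(bnf))
    print(split_submission("<s> ::= a $PROGRAM a $PROGRAM"))
```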